NPS ARCHIVE 
1964 

WATSON, J. 




A STUDY OF THE USEFULNESS AND LIMITATIONS 
OF MENTAL MEASUREMENT IN THE ARMED SERVICES 



JUNIOR JENNINGS WATSON 



A STUDY OF THE USEFULNESS 



AND LIMITATIONS OF MENTAL 
MEASUREMENT IN THE ARMED SERVICES 



by 

Junior Jennings Watson 
Lieutenant, SC, U.S.N. 



Submitted in fulfillment of the 
requirements for the course 

INDIVIDUAL RESEARCH 
Mn-400 

United States Naval Postgraduate School 
Monterey, California 



1964 






A STUDY OF THE USEFULNESS 
AND LIMITATIONS OF MENTAL 
MEASUREMENT IN THE ARMED SERVICES 

by 

Junior Jennings Watson 

This work is accepted as fulfilling 
the requirements for the course 
INDIVIDUAL RESEARCH 
Mn-400 

United States Naval Postgraduate School 







ABSTRACT 



Mental measurement in the Armed Forces is an absolute necessity. 
Two interrelated existing problems must be resolved before mental 
measurement can be used most effectively. First, a determination of 
which jobs or occupations need to be filled in the Armed Forces is 
necessary, and, additionally, performance must be measured. The 
second problem is testing or mental measurement. Test development 
must be centered around a job or occupation or a series of jobs or 
occupations and correlated against job or occupation performance. 

A partial review of the literature on military testing indicates 
that testing is being conducted without resolving the first problem 
in any sound testing program. Additionally, there are indications 
that correlation studies of currently used tests are frequently not 
conducted in an unbiased scientific manner. High correlation coefficients, 
no matter how they are obtained, have possibly become the ultimate 
goal of military testing. 













TABLE OF CONTENTS 



CHAPTER 

I. HISTORICAL BACKGROUND 
     Basic military needs 
     Advances in military testing 
     Selected Navy tests 
       Enlisted basic test battery 
       Other Navy enlisted tests 
       Basic test for Naval officer personnel 

II. CURRENT USES OF TESTS 
     Military application 
       Uses of Navy enlisted tests 
       Uses of Navy officer tests 

III. LIMITATIONS OF TESTING 
     Test construction 
     Reliability of tests 
     Validity of tests 

IV. NECESSITY FOR TEST IMPROVEMENT 
     Construction 
     Validity 

V. TOOLS FOR THE FUTURE 
     Research required 
     Selection for training 
     Economics of testing 

VI. CONCLUSIONS AND RECOMMENDATIONS 
     Conclusions 
     Recommendations 

BIBLIOGRAPHY 

APPENDIX A 

APPENDIX B 

APPENDIX C 



CHAPTER I 



HISTORICAL BACKGROUND 

Military commanders have for many centuries judged the 
mental capacity of their subordinates. This was and is 
today one of the principal elements military commanders must 
consider when assigning personnel. Selection and assignment 
of personnel during the nineteenth century, a period of 
relatively small regional wars, was based upon political 
considerations, judgement of mental capacities, technical 
know-how, and necessity in that order. 

Twentieth century global warfare required many more men 
on the battle field, each assigned so that his contribution 
to the war effort would be maximized. Military commanders 
found it impossible to evaluate millions of men on a personal 
judgement basis when the United States entered World War I. 
The American Psychological Association appointed a committee 
to study this problem. A method of determining the 
intellectual level of millions of men was desired. This was 
considered necessary for the rapid classification, training, 
and assignment of men to different types of service in order 
to save time and to properly utilize available human 
resources. 

Prior to World War I, psychological testing had not, in 
general, attempted to measure individual differences for the 
purpose of personnel placement. Mental measurement tests 
had been developed for use in clinical examination of 
psychiatric patients. In addition, Cattell, in 1890, 
described a series of tests which were being used to 
determine the intellectual level of college students, 
although his idea that "a measure of intellectual functions 
could be obtained through tests of sensory discrimination and 
reaction time" appears to be somewhat in error. (1) 

A. BASIC MILITARY NEEDS 

A group test was desired to handle the classification 
problems of World War I. Army psychologists drew upon all 
available test material, much of which was unproven as to 
its usefulness, and developed the Army Alpha and Beta tests. 
These tests were well suited for group use, where only a 
very general classification was required. The Alpha test 
was designed for literate groups while the Beta was for use 
with those who were not literate in the English language. 
The Beta test proved to be less valid than the Alpha, but 
it discriminated sufficiently for emergency use. An 
additional group test, the "Personal Data Sheet," was 
used in World War I to screen out those individuals with 
psychological difficulties. Psychologists also developed 
tests of special or specific abilities that proved to be 
moderately useful. Army psychologists in collaboration with 
their civilian contemporaries were able to develop a group 
intelligence test that contributed immeasurably to the 
solution of the emergency classification problems and 
ultimate success of the war effort. 

(1) Anne ... MacMillan Company, Ne ... 

Group tests developed by the Army in World War I were 
eagerly accepted for civilian use after the war. Group tests 
became the panacea in personnel selection and placement. 

This movement encompassed peoples of all ages and groups. 
Studies of special groups were undertaken for various reasons. 
The use of group tests became indiscriminate and when the 
results failed to meet expectations much hostility and 
skepticism developed. Much of this hostility and skepticism 
is still present, one and a half generations later, for 
reasons which were and may still be well founded. 

B. ADVANCES IN MILITARY TESTING 

During the interim between World Wars I and II, and the 
advent of group testing, many advances were made in the use 
of mental measurement tests. The Navy’s Bureau of Navigation 
organized a personnel testing program as a part of its 
training division in 1924. A General Classification Test 
was used at training stations to select enlisted men for 
Navy schools. Later, this same test was used as a screening 
device at recruiting stations. Other tests were also 
introduced and by December 1941 the following tests were in 
general use at recruit training stations: "General 
Classification, Mechanical Aptitude Test, Arithmetic Test, 
English Test, Spelling Test, and Radio Aptitude Test." (2) 

These tests had served well during peacetime when the 
ratio of selection to applicants for Naval service was 
rather low. However, when this ratio was raised as mass 
mobilization became a necessity, the tests were found to be 
grossly inadequate for selection purposes. There was little 
differentiation between good men and their capabilities in 
various rates. Training schools found that many men enrolled 
had little capability in their assigned specialty. Local 
testing programs developed at many stations in an attempt to 
overcome these difficulties. 

By May 1942, the enormity of the personnel testing 
program and its concomitant problems was recognized. A 
request for assistance was made to the Office of Scientific 
Research and Development. As a result of this and other 
developments, a two-pronged attack was launched to resolve 
the problems related to testing as soon as possible. 



(2) Personnel Research and Test Development in the Bureau 
of Naval Personnel, ed. Dewey B. Stuit, Princeton University 
Press, 1947, p. 6. 



The problems were essentially divided into two parts. 
First, since the present tests revealed a lack of validity, 
tests had to be developed which were valid. Second, little 
was known about the requirements of Navy training on a mass 
basis. Knowledge concerning the second aspect of this 
problem was necessary before the first part could be resolved. 
This emergency was met in much the same fashion as the 
identical problem of selection, classification, and placement 
was met in World War I. Both the Army and Navy faced these 
problems and, as before, psychologists and personnel 
officials of both services pooled their resources and procured 
civilian assistance and civilian tests. Tests began to 
improve in validity and continued to do so after late 1942 
for the remainder of the war. (3) 

Following World War II, in 1946, a permanent research 
organization was approved to: 

Undertake a coordinated program of personnel research 
and test development centered around the major 
personnel problems of the Navy... to conduct studies 
on personnel, policy, techniques, and procedures, and 
on the assignment, evaluation, promotion or advancement, 
and morale of officer and enlisted personnel; ...and to 
develop such psychological and educational tests and 
other instruments as may be necessary for the selection, 
classification, training, and evaluation of performance 
of Navy personnel. (4) 



(3) Future wars may not require such mass mobilization, and 
in any event will surely not allow sufficient time for the 
construction of tests which are valid enough to be used as a 
reliable guide for the selection, placement and training of 
personnel. 

(4) Stuit, op. cit., p. 11. 



This organization has had various titles over the last 
eighteen years and has made many recommendations for 
improving the Navy's personnel administration. Fund 
limitations, as well as opposition to change, have prevented full 
implementation of the program and its recommendations. 

C. SELECTED NAVY TESTS 

As a result of the groundwork laid during World War II, 
and the permanent organization established as the Personnel 
Research Division of the Bureau of Naval Personnel shortly 
after the war, many methods of mental measurement have been 
devised. This discussion will be limited to those tests 
considered basic, with mention of other special Navy tests. 

Enlisted Basic Test Battery 

The General Classification Test (GCT) 

...is a 100-item test designed to measure the ability 
to comprehend material of a verbal nature. The 40 
sentence-completion items and 60 verbal-analogy items 
which comprise the test are arranged in order of 
increasing difficulty. The testee is to select the 
one most correct answer from the five possible answers 
which are given. A time limit of 35 minutes is used. 
(5) 

(5) Development and Standardization of the U. S. Navy Basic 
Test Battery, Form 6, Bureau of Naval Personnel Research 
Report 56-2, U. S. Naval Personnel Research Field Activity, 
San Diego, Calif., Nov. 195[?], p. 1. 

This test, like all Navy tests designed for mental 
measurement, is standardized and differences can be readily 
established. Appendix A illustrates the comparison of the 
Navy's standard T-scores with Z-scores, Stanines, and IQ 
scores. All of these different methods of scoring are based 
on a normal probability curve. 
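
The relationship among these scales can be sketched numerically. The short Python sketch below assumes the conventional scale constants (a standard score with mean 50 and standard deviation 10, an IQ scale with mean 100 and standard deviation 15, and stanines with mean 5 and standard deviation 2). These constants and all function names are illustrative assumptions and are not taken from Appendix A.

```python
import math
from statistics import NormalDist


def z_from_standard(score, mean=50.0, sd=10.0):
    """Convert a standard (T-type) score to a Z-score.

    The mean of 50 and SD of 10 are assumed conventional values.
    """
    return (score - mean) / sd


def iq_from_z(z, mean=100.0, sd=15.0):
    """Express a Z-score on an IQ-style scale (assumed mean 100, SD 15)."""
    return mean + sd * z


def stanine_from_z(z):
    """Map a Z-score to a stanine: half-SD bands centered on 5, clipped to 1..9."""
    return max(1, min(9, math.floor(2.0 * z + 5.5)))


def percentile_from_z(z):
    """Percent of a normal population expected to fall below this Z-score."""
    return 100.0 * NormalDist().cdf(z)


if __name__ == "__main__":
    for score in (30, 40, 50, 60, 70):
        z = z_from_standard(score)
        print(score, z, iq_from_z(z), stanine_from_z(z),
              round(percentile_from_z(z), 1))
```

Because every scale is a linear rescaling of the same Z-score (with stanines additionally rounded into nine bands), a score on one scale fixes the score on the others once the constants are agreed upon.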

The Arithmetic Test (ARI) is designed as, 

...two separately-timed subtests, a 20-item Arithmetic 
Computation Subtest and a 30-item Arithmetic Reasoning 
Subtest. Both kinds of items are in five alternative 
multiple choice form. (6) 

Time limits are also established for both of these 

subtests and are 12 and 35 minutes respectively. 

A Mechanical Test (Mech) is designed as, 

...two separately-timed 50-item subtests, Tool Knowledge 
and Mechanical Comprehension. . . .time limits are 10 
minutes for the Tool Knowledge Subtest and 25 minutes 
for the Mechanical Comprehension Subtest. 

Each tool knowledge item consists of five pictures of 
mechanical or electrical tools or equipment. The testee 
is to select from the last four objects pictured the one 
which is most closely associated with the tool or 
object in the first picture. 

Each mechanical-comprehension item consists of one or 
more drawings in which a mechanical problem is presented. 
The testee is to show whether he understands the 
mechanical principles involved by marking one of the three 
possible answers provided. (7) 

The fourth and last of the Navy's Enlisted Basic Test 
Battery is the Clerical (CLER) which is designed to 

...measure the ability to observe quickly and accurately, 
consists of 240 pairs of five-to-nine-digit numbers which 
must be compared at a high rate of speed. The examinee 
indicates whether the two members of the pair are the 
same or are different by marking an or "0" in the 
adjoining answer space. (8) 



(6) Ibid., p. 2. 
(7) Ibid., pp. 2-3. 
(8) Ibid., pp. 3-4. 



Other Navy Enlisted Tests 

In order to obtain supplementary information necessary 
for proper classification of Enlisted personnel several 
special tests have been devised. These tests include but 
are not limited to the following: (9) 

1. An Electronics Technicians Selection Test 
2. Radio Code Test 
3. Telephone Talker Test 
4. Sonar Pitch Memory Test 
5. Navy Literacy Test (10) 
6. Non-verbal Classification Test 

One other test, perhaps the most important of all Navy 
Enlisted Tests, is the Advanced Technicians Test. Because 
of the increasing complexity of today's scientific and 
technological requirements more effective screening methods 
are provided for advanced technical training by the use of 
a new test. This test, The Advanced Technicians Test, 
consists of four parts, Reading Comprehension, Mathematics, 
Physics, and Electricity. 

The Advanced Technicians Test does not replace the basic 
test batteries. It is given to enlisted personnel in second 
or subsequent enlistments and results are recorded on page 3 of 
the service record and on the Bureau of Naval Personnel 
Enlisted Master Tape. Tests are given at designated Enlisted 
Classification Units and retests are not authorized. Tests 
are to be administered to the following categories of 
personnel not previously tested: (11) 

1. qualified submariners 
2. less than 12 years service 
3. all others who desire to be tested 
4. all first reenlistees 
5. all applicants for nuclear power training 

(9) Information and Education Manual, NavPers 16,963D, 
Bureau of Naval Personnel, Aug. 1955, p. 70. 

(10) This test is designed for those who cannot read English. 

Basic Tests for Naval Officer Personnel 

The need for an intellectual screening device for officer 
personnel was extremely urgent owing to the necessity for a 
rapid expansion of the Navy early in World War II. In an 
effort to meet this urgent requirement, two groups of tests 
were developed in 1942 and 1943. These tests were to be 
given and used as a part of an initial screening of applicants 
and as a classification tool after applicants had been 
processed beyond this initial stage. 

Basic among the tests developed was the Officer 
Qualification Test. This test, still used with modifications, 
consisted of three parts: vocabulary, mechanical comprehension, 
and arithmetical reasoning. It was felt that independent 
verbal, mechanical, and arithmetical abilities were indicated 
by factorial analysis. 

(11) Bureau of Naval Personnel Instruction 1236.2, 20 June 19[?]. 

The vocabulary portion of the 100 question test 
consisted of 50 opposite items where the testee was to 
select a word from among five that was nearest to being the 
opposite of a stimulus word. The Arithmetical portion of the 
test consisted of twenty questions having five choices for 
each question. The Mechanical Comprehension test completed 
the Qualification Battery. This subtest consisted of thirty 
items illustrating mechanical situations about which a 
question was asked and an answer chosen from three alternatives. 
Sixty minutes was allowed to complete the entire battery, with 
recommended times for each section. 

An Officer Classification Test was developed to 
differentiate among officers in order that assignments could 
be made to specific duties with a minimum of misplacement. 
This test battery is composed of five sections as follows: (12) 

I.   Verbal Reasoning Test          75 five-choice analysis items 
II.  Mechanical Comprehension Test  4[?] five-choice mechanical 
                                    comprehension items 
III. Mathematics Test               50 five-choice mathematics items 
IV.  Relative Movement Test         50 four-choice relative 
                                    movements items 
V.   Spatial Test 
     A. Block Assembly              30 four-choice block assembly items 
     B. Block Rotation              30 five-choice block rotation items 

(12) Stuit, op. cit., p. 108. 



CHAPTER II 



CURRENT USES OF TESTS 

Tests as mental measurement devices are currently 
enjoying widespread use. The majority of psychologists and 
many personnel administrators feel that mental testing has 
attained its majority and is moving toward a yet to be found 
maturity. Many others feel that mental testing has attained 
its long sought maturity and that the proper use of mental 
tests will benefit mankind immeasurably. A few professionals 
in the mental testing field feel that testing is still in 
its infancy and that test results furnish only a sample of 
individual capacities. These few professionals question 
mental testing and ask if mental tests do give a fair and 
effective measure of a person's intelligence, aptitude, 
knowledge or ability to think. 

A listing of tests currently in use with a brief 
description of each would fill several volumes. Even a list 
of companies which furnish testing services would be rather 
extensive. However, the five giants of this industry are 
(1) Educational Testing Service, Princeton, New Jersey, 
(2) Psychological Corporation, New York, N. Y., (3) Harcourt, 
Brace, and World, Inc., (4) California Test Bureau, Los 
Angeles, Calif., and (5) Science Research Associates, Inc., 
Chicago, Ill. 


A. MILITARY APPLICATION 



Each of the armed services has tests which attempt to 
measure intelligence, aptitude, knowledge, and ability to 
think. Tests used by the different armed services are 
similar and each service uses similar procedures. However, 
since each service considers itself unique and therefore 
has unique requirements, each has its own tests. 

These tests are of the pencil and paper type and are 
considered to be an essential part of the selection and 
classification procedures as previously noted. Tests are 
usually given early in an individual’s military career. 
Results are tabulated mechanically and posted in the member’s 
Service Record. Re-tests are allowed if it can be shown by 
the testee that he was under some severe handicap at the time 
of the original test. 

Tabulated test results are used to determine how and in 
what manner the newly inducted service members can make their 
greatest contribution to the mission of their particular 
service. 

Uses of Navy Enlisted Tests 

Test results are first used in the case of Naval 
enlisted personnel at Recruit Training Commands. During 
their period of recruit training, but after the basic test 
battery has been scored and entered in their personnel 
record, they are interviewed individually by military 
personnel who are alleged to be qualified in personnel 
placement work. At this interview, careful consideration is 
given to the individual's personal preference, test scores, 
civilian work experience, motivation, previous training, and 
general interests. The recruit is then given his first 
classification. (13) 

At this time recommendations for Class A training are 
made, although assignment to schools for all recruits 
classified as eligible is frequently impossible due to peak 
recruit inputs, service needs, and school capacities. 

Also in the past, recruits who scored low on their 
basic test battery were recommended for administrative 
discharge at this point inasmuch as they were deemed not 
capable of being trained to fill Navy billets. However, this 
has been partially corrected by more adequate testing prior 
to enlistment which is designed to weed out individuals 
whose ability to read and write is suspect. 

The Navy's Bureau of Naval Personnel has established 
minimum cutting scores on the basic test battery for many 
occupations requiring formal training. These minimum scores 
are widely disseminated throughout the Navy and individuals 
who have attained the required status as indicated by their 
scores are eligible to apply for formal training after they 
have left recruit training. Individuals who apply for 
Class A formal training after recruit training are subject to 
handicaps, but channels are available to overcome these 
impediments if the qualified person is persistent. 

(13) Classification interviews are occasionally carried out 
in a perfunctory manner, allowing less than ten minutes per 
interview. 

One of the most severe handicaps is the constant shortage 
of on board personnel in commands afloat and ashore. This 
shortage has occasionally been justified, but more frequently 
it is the result of local provincialism. Whatever the reason, 
individual requests from qualified personnel frequently never 
leave their respective commands. 

Another handicap frequently encountered in the field is 
a basic misunderstanding as to what test scores mean. For 
example, a department needs another striker, and a sailor is 
selected based upon rather general criteria. One of the 
most heavily weighted factors considered is his basic test 
scores. If the newly acquired striker learns his new job 
rapidly, he is considered to be a fine fellow and his test 
scores were an excellent predictor of his success. If the 
striker learned the job slowly or not at all, he is considered 
a smart never-do-well and is promptly labeled as such. This 
label, which is informally spread throughout the command, 
adheres to this person without discrimination. He is seldom 
afforded another opportunity to strike for an occupation 
where his talents could be utilized. In the event this second 
hypothetical sailor has sufficient obligated service when a 
must-quota for school is sent to the command, he will 
probably be made available, but in the meantime he may have 
become firmly convinced that the Navy is no place for him. 

The Bureau of Naval Personnel periodically issues 
Notices to field activities stating personnel requirements 
and requesting applicants for various schools. Each Notice 
or one of its references contains minimum test score 
qualifications for applicants. Continuing requirements are 
issued in the form of instructions. A limited number of 
instructions and contents are outlined below: 

Bureau of Naval Personnel Instruction 1510.69F 

This instruction concerns the Navy Enlisted Scientific 
Education Program (NESEP). The NESEP is an uninterrupted 
four-year college educational program leading to a 
baccalaureate degree in major fields approved by the Chief, 
Bureau of Naval Personnel. Upon graduation enlisted personnel 
are ordered to Newport, Rhode Island or elsewhere for Officer 
Candidate School (OCS). Upon successful completion of OCS, 
students are commissioned in the Regular Navy. Eligibility 
requirements include a combined GCT and ARI score of 118. 
This ensures the Navy that the chances of an individual 
succeeding in this program are excellent. Other screening 
examinations are given and all necessary precautions are 
taken to ensure that applicants are properly motivated. 
Similar prerequisites are set forth in the Bureau of Naval 
Personnel Notice 1531 of 28 June 1963 for military academy 
applicants. 

Bureau of Naval Personnel Instruction 1500.150 

This instruction concerns the selection and training of 
candidates for diving duty. The mental requirements for 
selection to this program are listed as desirable and consist 
of a combined ARI and MECH score of 105. 

Similar prerequisites are established for Electronic, 
Clerical, and other occupations. Prerequisites are geared to 
the level of skill deemed necessary to successfully function 
in the particular occupation. However, Commanding Officers 
may request test score waivers in meritorious cases where it 
is believed that a candidate does possess the necessary 
capacity for training and that this capacity is not reflected 
in his test scores. 
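
The cut-score rules described above amount to a threshold check on a combined score, with a waiver path for meritorious cases. A minimal Python sketch follows. The program labels, function name, and data layout are hypothetical; the two thresholds are the combined scores as read from the text (118 for GCT plus ARI, and 105 for ARI plus MECH, the latter listed only as desirable).

```python
# Combined-score prerequisites as read from the text. The dictionary keys
# ("NESEP", "DIVING") are hypothetical labels chosen for this sketch.
MINIMUM_COMBINED = {
    "NESEP": (("GCT", "ARI"), 118),
    "DIVING": (("ARI", "MECH"), 105),  # listed as "desirable", not mandatory
}


def meets_prerequisite(program, scores, waiver_granted=False):
    """Return True if the combined score reaches the minimum or a waiver applies.

    `scores` maps test abbreviations to standard scores; `waiver_granted`
    stands in for an approved Commanding Officer's waiver request.
    """
    tests, minimum = MINIMUM_COMBINED[program]
    combined = sum(scores[name] for name in tests)
    return waiver_granted or combined >= minimum


if __name__ == "__main__":
    sailor = {"GCT": 55, "ARI": 60, "MECH": 50, "CLER": 48}
    print(meets_prerequisite("NESEP", sailor))                       # 115 < 118
    print(meets_prerequisite("NESEP", sailor, waiver_granted=True))  # waiver path
    print(meets_prerequisite("DIVING", sailor))                      # 110 >= 105
```

The sketch treats the score as a gate only; it says nothing about how well the score predicts later performance, which is the question taken up in the chapters that follow.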

Efforts by the Chief, Bureau of Naval Personnel to 
establish minimum prerequisites necessary for an individual 
to attain proficiency in many areas have been extremely 
successful as can be judged by correlation studies between 
test scores and grades of students completing courses of 
study. However, it should be emphasized that these studies 
included only those students who graduated and did not 
include those who were disenrolled for various reasons or 
given certificates of completion. The prerequisites were 
established to reduce costs and to increase the level of 
training in the Navy. It can be safely assumed that these 
two objectives have been partially met. 

Efforts have also been made by the Chief, Bureau of 
Naval Personnel to enforce compliance with established 
prerequisites. Bureau of Naval Personnel Instruction 1510.7 
of 20 Oct. 1952, which is still in effect, noted that 
excessive numbers of ineligible candidates were being 
received at enlisted service schools. This instruction 
directed the attention of all commanding officers to the 
problem and further directed strict compliance with current 
directives. Failure to meet minimum basic battery test 
scores was listed as one of the most frequent errors causing 
candidates to be ineligible. 

Uses of Navy Officer Tests 

The tests designed for use in selecting and classifying 
officers were outlined in Chapter I. These devices were 
used during World War II. The Officer's Selection Battery 
served to screen out personnel who were not deemed to be of 
officer potential and was given to practically all officers. 
The Officer's Classification Battery was not administered 
to all officers and many officers have never taken this 
series of tests. A survey of ninety-four officers having 
from five to eighteen years service at the U. S. Naval 
Postgraduate School in 1962 revealed only twenty-one officers 
who had taken these tests. (14) An estimate of the number of 
officers who have taken the classification battery is not 
available, but these tests have been regularly administered 
to all newly commissioned officers since about 1951. 

A search of publications, records, and regulations at 
the U. S. Naval Postgraduate School does not indicate that 
the test scores from the Officer's Classification Battery 
are enjoying wide use. 

The Officer's Selection Battery is enjoying wide use. 
All applicants for commissioned status are given this 
battery. It is given at officer procurement centers and is 
given annually to inservice applicants. This test battery is 
an extremely basic instrument and furnishes little information 
in addition to that required for acceptance or rejection. 



(14) J. Martz and T. E. Rushin, "Determination of Valid 
Criteria for Selecting Postgraduate Management School Candidates 
on the Basis of Established Academic Performance and Various 
Aptitude Tests" (Unpublished research paper, U. S. Naval 
Postgraduate School, Monterey, 1962), p. 11. 



CHAPTER III 

LIMITATIONS OF TESTING 

The limitations of testing in determining mental level, 
general aptitudes, and various personality characteristics 
of individuals are in the author's estimation presently 
uncountable. All of these limitations cannot be overcome in 
the foreseeable future. However, they may be reduced to a 
respectable level if due recognition is given to the facts. 

A limited search of the literature in the field of 
testing has not revealed a common concise definition of 
intelligence or intellect, although many testers and 
psychologists claim that this is what they are measuring. 
We can then only conclude that intelligence is what intelligence 
tests measure. If this definition of intelligence is 
accepted, no satisfactory test of ability to learn will ever 
be developed. Tests currently measure intelligence by the 
sample technique, i.e. a performance sample is taken under 
standardized conditions. This sample has actually been taken 
from the achievements of the individual. It has not measured 
his ability to learn, which may be far above or far below his 
level of achievement as revealed by the sample. 

Another limitation of testing is the use of test results. 
The example previously given concerning the selection of a 
striker is one misuse that could be easily corrected. If 
test results are treated as a final measure of ability or 
aptitude, then the test is being misused, because tests 
cannot furnish us with an absolute numerical measurement of 
the individual. 



A. TEST CONSTRUCTION 

Test construction is a long and arduous process. A 
decision must first be made concerning the purpose of the 
test, i.e. what abilities, proficiencies, or aptitudes are 
to be measured. In order to do this, the test maker must 
have a knowledge of the requirements of the particular 
functions which the testees will perform. He must then 
analyze the component abilities, proficiencies, or aptitudes 
which are necessary to perform the stated function. The 
test maker then prepares a large number of questions, almost 
always of the multiple choice variety for intelligence tests, 
to be used in the initial stage. Then a weeding-out process 
begins. The test maker may reject many of the questions and 
reword others at this stage. 

The surviving questions are then "pretested" on people 
comparable to those for whom the test is intended, and 
a statistical dossier is compiled for each question. If 
a question is answered correctly mainly by the "better" 
examinees it is a good question. If it is answered 
correctly mainly by the "poorer" ones it is a bad 
question. If a fair number of the "better" examinees 
favor one answer and a comparable number favor another, 
the question is probably ambiguous. If everyone gets 
it right, it is useless. And so on. 

In the light of pretest statistics, still further 
questions are rejected or rewritten, and ultimately a 
rigorously screened version of the test emerges. It is 
now ready to be given to the people for whom it was 
constructed. ...The test is given a preliminary try 
out and the results receive elaborate statistical 
analysis. (15) 
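
The per-question "statistical dossier" in the passage quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular test maker: the function name, the split of examinees into an upper and a lower half by total score, and the verdict labels are all assumptions made for the sketch.

```python
def item_analysis(responses, key):
    """Classify each question by how 'better' and 'poorer' examinees answer it.

    responses: one list of chosen options per examinee.
    key:       the keyed (correct) option for each question.
    Examinees are ranked by total score and split into an upper and a lower
    half (an assumed rule; other splits are possible).
    """
    totals = [sum(a == k for a, k in zip(row, key)) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
    half = len(order) // 2
    better, poorer = order[:half], order[-half:]

    report = []
    for q, correct in enumerate(key):
        p_better = sum(responses[i][q] == correct for i in better) / half
        p_poorer = sum(responses[i][q] == correct for i in poorer) / half
        if p_better == 1.0 and p_poorer == 1.0:
            verdict = "useless"            # everyone gets it right
        elif p_better > p_poorer:
            verdict = "good"               # answered mainly by the better group
        elif p_better < p_poorer:
            verdict = "bad"                # answered mainly by the poorer group
        else:
            verdict = "no discrimination"
        report.append((q, p_better, p_poorer, verdict))
    return report


if __name__ == "__main__":
    key = ["A", "B", "C", "D"]
    responses = [
        ["A", "B", "A", "D"],
        ["A", "B", "A", "D"],
        ["A", "B", "A", "D"],
        ["A", "E", "C", "E"],
        ["A", "E", "C", "E"],
        ["A", "E", "A", "E"],
    ]
    for row in item_analysis(responses, key):
        print(row)
```

Note that the "better" group is defined by the test's own total score, so a question is judged good when it agrees with the rest of the test rather than with any outside measure of performance.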

At the time when original construction begins the test
maker decides what salient characteristics testees must
possess in order to perform the job for which the test is to
be given. These characteristics may vary in quantity, but
are usually small in number. Original questions are selected
to measure each of these characteristics and hopefully
through the above quoted procedure the finished test in its
smooth form will furnish the test user with sufficient
information which will allow him to make a better personnel
decision than he could have made without the test.
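
The weeding-out rules in the quoted procedure can be made concrete
with a short sketch. It is offered only as an illustration and is
not taken from any of the sources cited. The function name, the
sample data, and the choice of the top and bottom 27 percent of
examinees as the "better" and "poorer" groups are all assumptions
made for this example.

```python
def item_analysis(responses, group_fraction=0.27):
    """Return (difficulty, discrimination) for each question.

    responses: one list per examinee, holding 1 (right) or 0 (wrong)
    for every question. group_fraction is an assumed cut-off for the
    "better" and "poorer" groups; it is not taken from the source.
    """
    ranked = sorted(responses, key=sum, reverse=True)  # best total first
    k = max(1, round(len(ranked) * group_fraction))
    better, poorer = ranked[:k], ranked[-k:]

    report = []
    for q in range(len(responses[0])):
        # Share of all examinees answering correctly. A value of 1.0
        # means everyone got it right, so the question separates no one.
        difficulty = sum(r[q] for r in responses) / len(responses)
        # Positive: answered correctly mainly by the better examinees.
        # Negative: answered correctly mainly by the poorer examinees.
        discrimination = (
            sum(r[q] for r in better) - sum(r[q] for r in poorer)
        ) / k
        report.append((difficulty, discrimination))
    return report


# Hypothetical pretest: six examinees, five questions.
pretest = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
]
for number, (p, d) in enumerate(item_analysis(pretest), start=1):
    print(f"question {number}: difficulty {p:.2f}, discrimination {d:+.2f}")
```

In this made-up pretest the first question is answered correctly by
everyone, the second mainly by the better examinees, and the third
mainly by the poorer ones, which corresponds to the "useless",
"good", and "bad" questions of the quotation.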

In constructing the test, every possible aspect has
been standardized. Standardized time, room temperature and
lighting are desirable. Timing is considered particularly
important inasmuch as this helps to weed out testees who are
not familiar with the subject matter of questions and are
slow in coming up with an answer. This also saves time on
the part of the tester and the testee.




15 Banesh Hoffmann, "The Tyranny of Multiple-Choice Tests,"
Harper's Magazine, CCXXII, No. 1330 (March, 1961), p. 3^.



Another aspect of standardization in tests is the answers.
Questions of the multiple choice variety usually request that
the correct answer be chosen out of 3, 4 or 5 possible answers
or that the best answer be chosen by the testee. When the
test is pretested, standardized answers are selected by the
test maker on the basis of the most successful examinees'
answers. Answers thus obtained are, of course, subjected to
the most severe statistical analysis. Assurances can then
be given without reservation that the standardized answer for
each question is significant at a particular level. Since
we have predetermined answers for questions, test grading is
a very simple matter requiring no judgement. Where large
numbers of tests are involved, grading by machine is the
least expensive and most accurate method of determining test
scores. This is true of all "objective" type standardized
tests.

Objective type multiple choice tests are generally
thought to be of very high caliber inasmuch as the margin for
human error has been largely removed from well constructed
tests. This is a possible error in test construction.
Individuals who take tests can only answer questions
subjectively, i.e., within the framework of their own experiences,
achievements and judgement. Mass testing with predetermined
answers accentuates previous experiences and achievements.
Little stress is placed on the judgement of the individual.
Justification for this emphasis on experience and achievement
is readily apparent. The individual with the desired level
of intelligence or judgement will have had experiences
similar to others in society. Therefore his current judgement
or reasoning ability is a direct result of his past
achievements and experiences. If he has no experience in an area
being tested, he will be scored low by the machine, because
he was not standard and didn't produce standard answers.

In some cases the testee is penalized for using
judgement. An example of a sentence completion item from the
Navy's General Classification Test will reveal this:16

A good sailor will ______ the orders of his superior officer.

(A) see
(B) fear
(C) question
(D) obey
(E) change

It is presumed that this question is no longer in use,
if it was ever used, but it is felt that it is representative
of many completion (choose the best answer) type questions.
This question or a similar question is administered to
recruits after a period of recruit training. During training,
conformity in thought and actions is a desirable behavior
pattern. Lectures laud the life of a good sailor and the
merits of obedience are extolled. Movies of the Navy's
great accomplishments are shown and to each order issued by
a superior officer, good sailors have resounded with an
"Aye, Aye, Sir" meaning that the good sailor has received
the order, he comprehends the order, and he will obey the
order. The key word then becomes "obey", but is this the
desired objective answer? Most sailors will undoubtedly choose
"obey" to complete this sentence. A few will choose answers
B, C, and E because they don't understand the question, or
they have psychological incapacities. But the sailor who is
attempting to use judgement is at an impasse. He knows that
a good sailor must receive an order, comprehend it, and then
obey it. This sailor knows that the statement implies orders
have been issued. Further he knows that to see or comprehend
an order is an absolute prerequisite to obeying. Should a
good sailor comprehend all orders given by superiors or
should a good sailor obey orders received without question,
whether he understands them or not? He may follow a logical
sequence and give see as his answer or he may try to figure
out what answer the test maker wants and give obey. In
either case he has been left far behind other testees and
may not finish the test in standard time.17

16 Research Report 5^-2, loc. cit.

17 The objective answer to this question is unknown.



Other examples of objective type questions that might
be deemed confusing, vague, misleading or ambiguous can be
found in many tests in use today. The California Survey of
Mental Maturity, Form I, is considered by many authorities
to be a test of considerable merit. This is a multiple
choice objective type test divided into language and non-
language sections with several subsections in each section.

One subsection of the language section on page 5, left
column, states: "In each row, there is one picture that shows
something which is the opposite of the first picture. Mark
its number." (Items 23-27)18 Question number 25 gives a
picture of falling rain in a wooded area as a first picture
and as its possible opposites, there are pictures of (1)
an exploding stick of dynamite; (2) a geyser spewing into
the air; (3) a water fountain sprinkling water into the air;
and (4) a mountain stream. The correct objective answer to
this question is number 4 possibly because the test makers
thought that a mountain stream was not violent. A non-random
sample of seven testees of high intellect chose number 1 as
the correct answer and all because the other three choices
contained moving water.

The language section of this test contains a question,
number 2, considered to be more defective than number 25. For
this subsection,19 instructions tell us to: "Mark the number
of the word that means the same or about the same as the
first word." The first word listed is oppress. The possible
choices are listed as: promise, imitate, crowd, and burden.
It was not deemed necessary to test this question. In this
case, a review of a current English dictionary by the test
makers would have given cause to remove the question from
the test. Both to crowd and to burden are listed as correct
meanings for oppress.20 However, the objective answer to
this question is burden.

18 W. W. Clark, et al., Survey of Mental Maturity, Form 1
(Los Angeles: California Test Bureau, 1959), p. 5.

19 Ibid., p. 10.

For an excellent analysis of multiple choice questions
with objective answers, readers are invited to consult
The Tyranny of Testing by Banesh Hoffmann. Dr. Hoffmann has
made a comprehensive study of testing and estimates that as
many as 5 percent of the questions used in our best tests are
defective. He has taken an analytical approach in his
study that may help to improve testing.

B. RELIABILITY OF TESTS 

It is often stated by test makers that a test cannot be
valid unless it is reliable. Reliability, quite simply,
refers to consistency of results. "In theory if an individual
were to take a test three or four times he would answer each
question the same way and would come up with the same score."21
This is ideal reliability and unfortunately seldom happens.
In fact, scores usually improve each time a test is taken
and may improve considerably if the individual has taken other
tests of the same type, even on subject material unrelated
to that of the original test. This is the test-retest method
of measuring reliability and it is not often used for reasons
which are obvious from the above discussion. However, memory
traces which cause improvement each time a test is taken tend
to fade with time and better reliability is found in using the
test-retest method when a time span is allowed between tests.
Since individuals are continually learning, the time span
allows a person to acquire new knowledge which interferes with
our reliability test. For all of these reasons, the
test-retest method is less than satisfactory.

20 C. T. Onions (ed.), The Oxford Universal Dictionary
(New York: Rand McNally & Company, 1955), p. 1377.

21 Rossall J. Johnson, Personnel and Industrial Relations
(Homewood, Ill.: Richard D. Irwin, Inc., 1960), pp. 50-51.

Two other methods of testing for reliability commonly
in use are the equivalent form and split halves methods. In
the equivalent form, two tests are developed of equal difficulty
covering the same subject. If these tests are identical,
scores on the tests are identical. Since any two questions
are never identical, reliability must be estimated, but fair
estimates can be obtained in this manner. The split halves
method is a variation of the equivalent form method. In the
latter method two tests are developed, while in the former, a
single test is split into two parts of equal difficulty and
the halves are measured against each other and then by
statistical massage test reliability can be determined. The
Navy uses both of these methods for developing test
reliability data by using each method singly or combining the
two methods in some cases.
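
The split halves method lends itself to a brief sketch. The text
does not say which statistical adjustment is applied to the
correlation between the halves; the Spearman-Brown formula used
below is the customary one and is assumed here, as are the
odd-even split, the function names, and the sample answers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of scores.

    Assumes both lists actually vary; identical scores for every
    examinee would make the denominator zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length estimate."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Split each answer sheet into odd and even questions and
    correlate the two half scores across examinees."""
    first_half = [sum(r[0::2]) for r in responses]   # questions 1, 3, 5, ...
    second_half = [sum(r[1::2]) for r in responses]  # questions 2, 4, 6, ...
    return spearman_brown(pearson_r(first_half, second_half))


# Hypothetical answer sheets: five examinees, six questions (1 = right).
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"estimated reliability: {split_half_reliability(answers):.2f}")
```

The step-up is needed because each half is only half as long as the
test whose reliability is wanted, and shorter tests are less
consistent than longer ones.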

C. VALIDITY OF TESTS 

Previously it has been noted that the test maker must
determine what component abilities, proficiencies and
aptitudes must be possessed to perform a given function.
Tests are considered valid if they measure these components
accurately. However, in addition, before tests are
considered valid, it must be proven that the original analysis
is correct, i.e., a test which measures verbal ability may
have high validity in measuring verbal ability, but the same
test may have very low or zero validity when correlated
with job success.

The ideal is seldom found and tests are considered
beneficial if a positive correlation exists. The greater the
coefficient of correlation, the better the test. Frequently
very low correlations are sufficient to weed out personnel
who are obviously not qualified to perform a given function.

Determining test validity is an extremely difficult
task. First a job analysis is necessary, then the criteria
for success must be established. Once the criteria have been
established, a grading or scale for assessing job success for
each individual is necessary. Only then can the validity of
a test be found by matching job success with test scores.


CHAPTER IV 



NECESSITY FOR TEST IMPROVEMENT 

Testing as a means of mental measurement can be an
excellent tool in our military arsenal; it is not sufficient
to maintain this tool in a static state when conditions and
needs are dynamic. The present state of testing can be
likened to a ship which was built in 1944 and has been kept
in an excellent state of repair. In many respects this ship
can still fill a vital role in the Navy's mission just as
testing assists in classifying, training, and placing personnel.
Improvements have been made in old weapon systems and new
ones, through research, have been developed. Old tests have
been improved; at least statistics tell us that test
reliability and validity are improving with each test revision.

A. CONSTRUCTION

Test construction has previously been discussed in broad
outline. Several defects have been pointed out and other
defects implied. These defects in total, if Dr. Hoffmann's
estimate can be accepted, would allow the less intelligent
individual with superficial knowledge to obtain a raw score
five percent higher than his more intelligent contemporary,
although the probability of an extreme of this sort is
quite low and waivers can usually be obtained if an applicant
for any program has persistence. However, the applicant's
persistence must not waver while he is convincing his
Division Petty Officer, Division Officer, Department Head,
and Commanding Officer of his sincerity.

Questions which are constructed using flawless grammar
with the best answer requested from a choice of several
alternatives are at best suspect when more than one correct
choice may be interpreted. There is considerable certainty
that we will obtain answers to questions of this type which
show us a normal distribution, i.e., after the question has
completed the cyclical test for reliability. This distribution
is obtained through careful study of answers given and answers
certainly reflect the experience, education, and achievements,
or lack of same factors, of persons answering the questions.

The author has been unable to locate any relevant studies
which attempt an analysis as to why distracter answers are
chosen by testees taking multiple-choice objective type
tests or for that matter why the objective answer is chosen
by testees. It is felt that this information is an
absolute necessity before questions of this type can be
clearly evaluated and used as a measuring device.

There is, of course, no excuse for constructing questions
which tend to mislead. This is a favorite method of many
college professors who test for rote memorization. This type
of question is not only incorrect, it is a discredit to the
intellect of man. In the category of questions which tend
to mislead, we must include all questions which are vague,
ambiguous, and in general tasteless. Questions of the
misleading variety may serve a valid purpose when used by
experts in individual testing, but their usefulness is
marginal when in group testing we attempt to measure the
intellect of a particular individual, although tests of this
type are beneficial if it is desired to measure one's
ability to detect flaws in construction.

There are very few questions of the types described
above in use by military testers, but any is too many. These
questions are a twofold detriment to sound testing because
the testee must first determine among many variables what
the question is requesting and then select an answer from
several possible objective answers. The total possibilities
in a poorly constructed question can be astronomical in
number. Perhaps probabilities could be assigned to each
possibility, but this would bring the testee no closer to
comprehending the question than before and his answer would,
largely, still be left to chance.

B. VALIDITY 

It is presumed in this section that military tests are
well constructed. They are highly reliable and reliability
tests show a correlation coefficient of .80 or greater.



The concept of validity is crucial to any testing program.
If a test is perfectly valid, it has a correlation
coefficient of plus 1, i.e., the level of performance of
each worker is identical to his test score in relation to
the group being tested. Perfect validation is illustrated
in Appendix B. At the other extreme, a test may have a
perfectly negative correlation coefficient of minus 1 where
the individuals who obtain the lowest scores are the best
workers. This is also illustrated in Appendix B. In the
event there is no relationship between test scores and work
performance, a zero correlation coefficient, also illustrated
in Appendix B, is said to exist. Tests with a zero correlation
coefficient are considered to have little merit, while those
having a positive or negative correlation can be used.
However, in choosing workers by using a test having a negative
correlation with job success, it must be remembered that
low scores mean that the worker will be a success on the job
for which the test was developed.
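
The three cases just described can be reproduced with a small
sketch. The test scores and performance ratings below are invented
solely to yield coefficients of plus 1, minus 1, and zero; they are
not the data of Appendix B, and the function name is an assumption.

```python
from math import sqrt


def validity_coefficient(test_scores, job_ratings):
    """Pearson correlation between test scores and job performance.

    Assumes each list contains at least two different values.
    """
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_j = sum(job_ratings) / n
    cross = sum((t - mean_t) * (j - mean_j)
                for t, j in zip(test_scores, job_ratings))
    spread_t = sum((t - mean_t) ** 2 for t in test_scores)
    spread_j = sum((j - mean_j) ** 2 for j in job_ratings)
    return cross / sqrt(spread_t * spread_j)


# Hypothetical test scores for five workers.
scores = [1, 2, 3, 4, 5]

# Invented performance ratings for the three cases in the text.
ratings_same_order = [10, 20, 30, 40, 50]      # best scorers are best workers
ratings_reverse_order = [50, 40, 30, 20, 10]   # lowest scorers are best workers
ratings_unrelated = [2, 1, 3, 1, 2]            # no linear relationship

for label, ratings in [
    ("same order", ratings_same_order),
    ("reverse order", ratings_reverse_order),
    ("unrelated", ratings_unrelated),
]:
    print(f"{label}: r = {validity_coefficient(scores, ratings):+.2f}")
```

A test with a coefficient near minus 1 is as useful for selection as
one near plus 1, provided the user remembers to pick the low scorers.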

The Navy's studies of test validity have been quite
extensive within a limited range. Available studies indicate
that the area of coverage has been limited wholly to
academic performance. This has been necessary because the
Navy has not yet developed an adequate system of rating
officers and men in job performance outside the training
area. Some controversy erupts from time to time between
proponents of various rating methods, but no system
has been officially adopted which even purports to solve
this conundrum. Therefore, since all military personnel
receive some training, the criterion for success hinges on
academic performance in training assignments.

Stuit's22 studies of test validity show a positive
correlation between scholastic achievement and test scores
for most Navy tests used in classifying both officers and
enlisted personnel in World War II. He succinctly points
out instances of negative correlation, but these are small
in number and can be disregarded. For the most part
correlation coefficients fell in the range .10 to .70. Any
coefficient above .60 is considered very high.

A more recent study of test validity revealed that there
was a significant positive relationship between the (BTB)
for enlisted personnel and final grades attained at class A
and class P Navy schools.23 Various combinations of the Basic
Test Battery scores were used in this study. These same
test score combinations had previously been used in assigning
personnel to school.

22 Stuit, op. cit., et passim.

23 Research Report 57-1, NAVPERS 1^344A, Revised Edition,
Personnel Measurement Research Branch, Personnel Analysis
Division, Bureau of Naval Personnel, April 1957, et passim.



In general, service schools in the Navy are under the
control of the Bureau of Naval Personnel. Service schools
are established as satellite commands in a larger complex,
as independent commands, and as school commands where schools
of several types are established under one commanding officer.
Training programs are established by the Bureau of Naval
Personnel in conjunction with a technical bureau having
primary responsibility in the area concerned. The Navy's
need for training personnel is determined by the Bureau of
Naval Personnel again in conjunction with the technical
bureau concerned. Quotas are established and personnel are
selected and assigned to the various established schools.
These assignments are based upon service needs, test scores,
and individual preference. Training commands, at this
point, have an approved training program and trainable
students and these commands are expected to train and
graduate men who are capable of performing technical service
in today's Navy of ever increasing complexity.

There are indications that service schools labor under
some handicaps in fulfilling their missions. Standards must
be set as a goal for students. At the same time personnel
requirements must be considered, so standards must not be
so high as to prevent the required number from completing
training. Standards among schools training personnel for
the same technical specialty do not vary, but commanding
officers who apply these standards in too rigorous a manner
may be subject to severe criticism. In situations where a
school is training personnel who are not meeting standard,
grading may have to be revised according to the study quoted
below.

Validity studies made under circumstances where true
performance is unknown are tenuous. In addition, school
operating personnel are placed in a rather difficult
position in meeting requirements of quality and quantity. A
paragraph from the study noted above does little to instill
confidence in the Navy's studies of test validity. This
paragraph is quoted as follows:24

(Usually the validity coefficients presented for 
two class "A" schools training men for the same
ratings are of comparable magnitude. However, in 
a few cases there are wide disparities. In these 
cases, for the schools with the much lower validities, 
the grading system might well be reviewed, since 
criterion unreliability is one of the factors which 
often reduce the obtained validities of aptitude tests.) 

We must, of course, agree that criterion reliability
is an absolute necessity if we are to obtain reasonably
correct validity coefficients, but the line of action proposed
here would only increase the validity coefficient and may not
correct it at all. In a military complex, a review has many
connotations and to single out a school and suggest that its
grading system be revised because validity studies do not
compare favorably is tantamount to censure. If some
disparity does exist, an examination is certainly indicated,
but in checking for criterion reliability, we should be a
bit more scientific and review the grading systems of all
schools having the same mission.

24 Ibid., p. 4.
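
The statement in the quoted paragraph that criterion unreliability
reduces obtained validities can be illustrated with the standard
attenuation relation of classical test theory. This relation is
supplied here for illustration only and does not appear in the
study quoted; the function names and all figures below are
assumptions.

```python
from math import sqrt


def attenuated_validity(true_validity, test_reliability, criterion_reliability):
    """Validity one would expect to observe when both the test and
    the criterion (here, school grades) are measured with error."""
    return true_validity * sqrt(test_reliability * criterion_reliability)


def disattenuated_validity(observed_validity, test_reliability,
                           criterion_reliability):
    """Inverse of the relation above: the validity implied if both
    measures were perfectly reliable."""
    return observed_validity / sqrt(test_reliability * criterion_reliability)


# Two hypothetical schools teaching the same rating. The underlying
# validity and the test reliability are assumed equal; only the
# consistency of the grading differs.
for school, grading_reliability in [("School X", 0.90), ("School Y", 0.40)]:
    observed = attenuated_validity(0.60, 0.80, grading_reliability)
    print(f"{school}: observed validity = {observed:.2f}")
```

Under these assumed figures the school with inconsistent grading
shows the lower coefficient, yet the sketch says nothing about which
school's grades better reflect true performance, which is why a
review of all schools having the same mission is the sounder course.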

A study by Thorndike and Hagen25 of more than ten thousand
men who had previously taken military test batteries was
completed and published in 1959. Several limitations were
recognized by the authors of this study in reaching their
conclusions on the validity of aptitude tests as a predictor
of job success in civilian occupations. All men studied were
gainfully employed in various jobs of their own choice. It
is stated:

"In general conclusion, we must say that though it is
possible that tests of aptitude can show validity in
long-range predictions of occupational success when
individuals are employed in jobs in widely different
parts of the country, our data give little evidence to
encourage this belief."

Conclusions and results are succinctly stated as follows:

25 R. L. Thorndike and E. Hagen, Ten Thousand Careers
(New York: John Wiley and Sons, Inc., 1959).

26 Ibid., p. 50.

Our results showed that occupational groups 
differed with respect to personal background variables 
as well as with respect to aptitude test scores. It 
is hard to make a quantitative comparison between 
these two types of information, but our judgement would 
be that items of personal background differentiated 
about as sharply as did scores on aptitude tests. 

Once again, the patterns were, in most instances, 
sensible and in accord with what we would have expected 
by a priori analysis of the occupations. It is possible 
to rationalize most of the significant differences with 
some satisfaction. There were, of course, some 
differences that are difficult to rationalize, but 
these can, in many instances, be thought of as chance 
variations and ones that probably would not hold up
in another sample. 

With respect to prediction of success within an
occupation, our conclusions must be quite different.
As far as we were able to determine from our data, there
is no convincing evidence that aptitude tests or 
biographical information of the type that was available 
to us can predict degree of success within an 
occupation insofar as this is represented in the 
criterion measures that we were able to obtain. This 
would suggest that we should view the long-range 
prediction of occupational success by aptitude tests 
with a good deal of skepticism and take a very restrained 
view as to how much can be accomplished in this direction. 
It is possible that data for a more heterogeneous group 
of applicants would lead to different conclusions in
this respect; however, our suspicion is that if the group 
had been more heterogeneous, our increased success would 
have shown up primarily in an increased sharpness of 
differentiation among occupations rather than in improved 
ability to predict within a single occupation. Certainly, 
if we had taken the whole range of abilities in the 
American population, the profile patterns would have 
become very much more clear-cut and the differences 
among occupations would have become a good deal more 
striking. Whether at the same time we would have developed 
some success at predicting degrees of achievement within 
an occupation seems very much open to question. 



The group involved in this study was limited to
former Army Air Force Cadets. Tests used were of the general
type previously described as being administered to officer
personnel for the purpose of classification. It is felt
that the results, as outlined by Thorndike and Hagen, speak
for themselves and the subject requires no further comment
at this point.




CHAPTER V 



TOOLS FOR THE FUTURE 

The United States Navy and other military services
have throughout the history of the United States served
their country well in both war and peace. However, never
before have the military services been called upon to
prepare for instantaneous defense of their nation on a
global scale. This calling has necessitated a peace-time
build-up of men and materials beyond the comprehension of
our civilian and military leaders in World War II.

In an effort to minimize the cost of the defense effort,
thus lessening the military drain on the National Economy,
civilian and military leaders have concentrated their efforts
on the spectacular, i.e., areas of high dollar cost. Efforts
in these areas have certainly given us more bang for the
buck. The art of Operational Analysis has been introduced
and promises to be extremely useful in lowering costs and
increasing efficiency. All new, as well as old, projects
are scrutinized to determine if they permit optimum use;
reduce costs; have sufficiently low costs; increase the
speed of; are capable of; promote and conserve; are
compatible with; maximize output; and a myriad of other
catch phrases meaning the same thing: get the most for the
military dollar.



Billions of dollars have been allocated and spent for
research and development of weapons and systems which
are deemed necessary for national defense. Many more billions
have been spent maintaining and operating these weapons
and systems.

In FY 1962, the Navy spent approximately 2.7 billion
dollars on military manpower. A very small fraction of this
amount was allocated to personnel utilization research.27
We have definitely increased our repertory of tools necessary
for the future, but, to a large extent, the tools necessary
for the proper utilization of manpower have yet to be
fabricated.

A. RESEARCH REQUIRED

Much research has already been accomplished, but our
knowledge of man is extremely limited. The general educational
level of a person can be obtained by a simple pencil and paper
test, but our knowledge of individual capacities must be
increased and put to use. For example, aptitude is defined
as:28 condition or set of characteristics regarded as
symptomatic of an individual's ability to acquire with
training some (usually specified) knowledge, skill, or set
of responses, such as ability to speak a language, to produce
music . . .

27 Dollar costs for this program were not available in
the Office of the Navy Comptroller or in the Bureau of Naval
Personnel. It is presumed that information of this type
would be extremely difficult to obtain with the accounting
system currently in use.

28 E. L. Hartley and R. E. Hartley, Outside Readings in
Psychology (New York: Thomas Y. Crowell Company, 1957),
p. 274.

This is a broad definition and probably fairly accurate
because it is a wide-angle approach to aptitude. It is to
be noted that knowledge in a specified area is not a necessary
prerequisite to being trained in that area. According to
many learning theorists, learning is accomplished most
rapidly when there is no interference from already acquired
knowledge.

The Navy's test for mechanical aptitude serves to
illustrate that there may be little relationship between
previously acquired mechanical experience and an aptitude for
learning mechanical skills. Validity studies for this test
normally reveal low correlation coefficients because we do
not know what characteristics or abilities are required to
learn a mechanical skill. According to the study by Thorndike
and Hagen, previously quoted, backgrounds differentiated
between occupations as sharply as did aptitude test scores.
This then appears to be an area that requires considerable
basic and applied research.

It is not felt that research of the type alluded to in
the previous paragraph should be performed within the military
establishment inasmuch as personnel who are in ratings,
specialties or occupations at present are probably not
representative of a population which seeks its own level in
society. Most military enlisted billets are filled by
personnel who were considered trainable in a particular
specialty at an early stage of their military service by
virtue of their test scores. Studies of this group have
vindicated past procedures and will certainly do so in the
future, but will furnish little usable data. Many military
specialties are, of course, not found in use in the
civilian economy nor will a military environment be frequently
found, but these superficial handicaps will for practical
purposes disappear when they are carefully examined.

B. SELECTION FOR TRAINING 

Chapter II briefly outlined the manner in which tests
are currently used for selecting enlisted personnel for
training. At that point it was noted that the Officer
Classification Battery (OCB) was not enjoying wide use as a
selection device. It was further shown in Chapter IV that
the (OCB), in a study by Thorndike and Hagen, may have little
validity as a predictor of what occupation will be chosen
by the individual and less validity as a predictor of
success.



The Superintendent, United States Naval Postgraduate
School,29 by inference, agrees that the (OCB) is an instrument
of limited usefulness. In a letter, Ser. 2166 dated 2 Aug 1963
to the Chief of Naval Personnel, the Superintendent set forth
his recommended guidelines for the Postgraduate Selection
Board's use in selecting students for postgraduate study
during academic year 1964-1965. These recommendations were
straight-forward and pertinent, but there was no mention of
the Officer Classification Battery.

Due to the diverse backgrounds of the several thousand
officers considered for postgraduate study, some common
attribute that could be used as a predictor of academic success
was needed. This was essentially resolved by considering the
officer's background as reflected in his personnel record on
file in the Bureau of Naval Personnel. Each officer had on
file fitness reports from which the Selection Board could
determine the level of his past performance, for the most
part, in non-academic assignments. The Selection Board also
had available academic transcripts of undergraduate education
from several hundred colleges and universities. The criterion
for assigning grades in many of these schools was unknown.
The direct cost of selection by this method is not insignificant
and the opportunity costs can be appalling.

29 This is the largest institution of its type in the world
and its primary mission is the postgraduate education of Naval
Officers.



Economics of Testing



Tests used by the military services as entrance
screening devices in peace-time save the tax-payers from an
unnecessary burden in two ways. First, monies are not
wasted in attempting to train personnel who do not have the
requisite capacities for military service and second the
total efficiency of the military organization is increased by
eliminating the possibility of non-trainable personnel acting
as a drag in an otherwise smooth-running organization.
Entrance standards have been low in the past and perhaps
will be lower in the future if the military services are
required to enlist and train the masses of unemployable.
However, the military services with the exception of the Army
have been able to screen out most of the untrainables prior
to enlistment.

This paper is principally concerned with what happens
after enlistment or commissioning since costs prior to this
time are insignificant as far as tests are concerned. Pay
and allowances, with variations for promotions, transfers, etc.,
are relatively fixed and can be roughly considered as sunk
costs for the duration of an enlistment or tour of active
duty.

The extent of testing officer and enlisted personnel
on active duty essentially depends upon the time available
for testing and the costs involved in testing. However, the
economic benefits to be derived hinge directly upon the
validity of the tests. If test validity is zero, or positive
with a low degree of confidence, the use of tests is not
economically feasible because of the expense involved in
testing.
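
The dependence of economic benefit on validity can be illustrated
with a small calculation. The sketch below is not taken from this
study. It assumes the linear utility formulation associated with
Brogden and with Cronbach and Gleser, and every figure in it (the
dollar value of one standard deviation of job performance, the
selection ratio, and the cost per test) is hypothetical and chosen
only for illustration.

```python
from statistics import NormalDist


def net_gain_per_selectee(validity, sd_performance_dollars,
                          selection_ratio, cost_per_applicant):
    """Expected dollar gain per person selected, net of testing cost.

    Assumes normally distributed test scores and top-down selection.
    All inputs are hypothetical illustration values, not study data.
    """
    unit_normal = NormalDist()
    # Cutoff score (in standard units) that passes the top fraction.
    z_cut = unit_normal.inv_cdf(1.0 - selection_ratio)
    # Mean standard score of those who pass the cutoff.
    mean_z_selected = unit_normal.pdf(z_cut) / selection_ratio
    benefit = validity * sd_performance_dollars * mean_z_selected
    # Every applicant is tested, so the cost is spread over selectees.
    testing_cost = cost_per_applicant / selection_ratio
    return benefit - testing_cost


# Hypothetical comparison: a test with no validity versus a modest one.
print(net_gain_per_selectee(0.0, 1000.0, 0.5, 5.0))
print(net_gain_per_selectee(0.3, 1000.0, 0.5, 5.0))
```

Under these assumptions a test of zero validity can only lose
money, since the benefit term vanishes while the testing cost
remains.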

The current procedure for selecting Naval Officers for
Postgraduate education is an example of the inadequacies and
diseconomies of tests as mental measurement devices. It is
evidently felt by military authorities that the use of the
(OCB) as a decision making tool would give rise to more
wrong decisions than correct decisions. If a test does this,
it is an economic burden. Several studies have been done by
Naval Management students on the validity of the mathematical
and verbal portions of the (OCB) as a predictor of success
in the Management curriculum. Appendix C illustrates in
plotted form the results of one such study. It can easily
be seen that the validity is near zero. These tests may be
valid as a predictor of success in other areas, but they cannot
be justified economically as a tool for selecting
management students.
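
Validity in this sense is ordinarily expressed as a correlation
coefficient between test score and a measure of success. The
following is a minimal sketch of that computation, assuming the
ordinary Pearson product-moment coefficient. The score and grade
lists are invented for illustration and are not the data plotted
in Appendix C.

```python
from math import sqrt


def pearson_r(test_scores, success_measures):
    """Pearson correlation between paired scores and success measures."""
    if len(test_scores) != len(success_measures) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists of at least two values")
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(success_measures) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, success_measures))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in success_measures)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical test scores and grade averages for six students.
scores = [48, 52, 55, 60, 63, 67]
grades = [3.1, 2.7, 3.4, 2.9, 3.3, 3.0]
print(round(pearson_r(scores, grades), 2))
```

A coefficient near zero, as reported for the (OCB) in the
Management curriculum, means that knowing the test score tells
the selection authority almost nothing about later success.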

Ideally, if we have one thousand new inductees and one
thousand billets to fill, tests of intelligence, aptitude, and
abilities, with perfect validity, would allow these officers
and men to be placed in assignments consistent with their
qualifications. This in turn would raise efficiency, i.e.,
output per man, and billets could be deleted in direct
proportion to increased efficiency inasmuch as only a given
level of output is required or can be economically tolerated
for defense. A testing program of this magnitude is
difficult to comprehend and perhaps not realistic when costs
are considered, but if, through testing and the proper
placement of personnel, we could achieve a one percent increase in
military manpower efficiency in FY 1964, a reduction in total
manpower requirements would save more than $120,000,000 while
maintaining the same output.
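
The arithmetic behind a figure of this size can be sketched as
follows. The annual manpower cost used here is not stated in this
paper; it is a hypothetical base, chosen so that a one percent
gain in efficiency yields a saving of $120,000,000, and the
function name is likewise an illustration only.

```python
def manpower_savings(annual_manpower_cost, efficiency_gain):
    """Dollars saved when output is held constant and efficiency rises.

    If each man produces (1 + gain) times as much, the same output
    requires only 1 / (1 + gain) of the original manpower.
    """
    required_fraction = 1.0 / (1.0 + efficiency_gain)
    return annual_manpower_cost * (1.0 - required_fraction)


# Hypothetical base of $12.12 billion; NOT a figure from this study.
assumed_cost = 12.12e9
print(round(manpower_savings(assumed_cost, 0.01)))
```

On this reasoning the statement in the text implies an annual
military manpower cost somewhat above twelve billion dollars.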

The military services are constantly striving to increase
the effectiveness of their weapons at the lowest possible
cost. Historically, manpower has been the most effective
weapon possessed by any nation involved in conflict. Manpower
must be considered as our most effective weapon in any future
conflicts, but wars cannot be won in the modern age if we
use our resources in a haphazard manner. Testing assists in
the proper utilization of human resources and can become a
more valuable tool in the future.

Any tool, such as testing, can be misused and result in
diseconomies which are reflected in exorbitant opportunity
costs. These costs arise in several ways, but the basic
misuses occur from treating test results as an absolute
indicator when in reality they should be considered as a
sample which does not reflect drive, motivation, interests,
or even aptitudes clearly. Another misuse which clearly
results in opportunity cost is to ignore test results when
they should be used.

Testing as a tool for the future presupposes that the
military services will train personnel of the highest
calibre in Personnel Management in order that decisions
involving personnel classification, placement, training, and
assignment will be made which reflect service needs,
personal needs on the part of members, and economies in
management.






CHAPTER VI 



CONCLUSIONS AND RECOMMENDATIONS
A. CONCLUSIONS 

The military services are using paper-and-pencil tests
to measure intelligence, aptitudes, and achievements. These
tests are contributing to the efficient utilization of military
manpower. Each of the military services, in its testing
program, presupposes unique personnel requirements. This is
difficult to fathom except for isolated occupations.

Testing, for the purpose of mental measurement, has not
reached its maturity and much basic research is required.

In fact, testing for military use is, at best, in its
infancy. Efforts to expand the frontiers of knowledge have
been tenuous and narrow in military testing. Improvements
have been made in the (BTB) for Naval enlisted personnel, but
validity studies indicate a need for better instruments.

Testing in its present state is a sampling device
which cannot be used effectively without considering
background factors, drive, and motivation in the assignment of
personnel. The consideration of background factors, drive,
and motivation has not been significant in the placement of
Naval enlisted personnel, while these have been the only
factors considered when Naval officer personnel are selected
for advanced or postgraduate training.



Many defects in our current testing program exist
because we originally made a cursory examination of the
occupations for which the tests were created. The expediency
required by war does not justify this superficial approach
to job analysis during peacetime.

Test validity studies justify the cost of testing Naval
enlisted personnel if the studies themselves can be accepted
as valid. Little is known, however, about the relationship
between job performance and test scores, although much worthwhile
information has been gained from studies of the relationship
between non-performance and test scores.

Lastly, it is concluded that testing in the military 
services is a necessary and important part of Military 
Personnel Management.

B. RECOMMENDATIONS

The following recommendations are made subject to
revision as new and/or more reliable data become available.

1. The present military testing program should be
continued. However, those tests not deemed sufficiently
valid to be used should be discontinued immediately.

2. Studies should be initiated by the Department of
Defense to determine if the four military services do
have unique personnel requirements.






3. An office should be established at the Department of
Defense level to coordinate and evaluate an intensive
research program. This program should be directed
toward occupational selection by individuals with an
advanced goal of success prediction.

4. A job analysis for each billet should be commenced
by the military services and coordinated by the
Department of Defense.

5. Training programs should be initiated by each military
service to train all personnel who make personnel
decisions in the uses and limitations of test scores.

6. Classification centers of each of the military
services should be staffed by personnel thoroughly
trained in eliciting background information from
individuals being interviewed as well as ascertaining
their motivations, drives, and ambitions. Centers
should be staffed with sufficient numbers of such
personnel to allow a minimum of one hour for each
interview. Personnel being classified should be given
a definite or a conditional classification. Personnel
who have been given a conditional classification should
be interviewed again, at a classification center, at
the end of one year and given a definite classification.

7. Each of the military services must develop a
performance rating system that will enable reviewing
authorities to evaluate performance by the degree of
job success.

8. The Department of Defense should request in the
next military budget monies for manpower utilization
research.

9. Tests in current use should be validated as soon as
job analyses are complete and job performance
evaluations are available.

10. The Department of Defense should plan and coordinate
the entire program as previously outlined in brief, and
a standardized military testing program should be
developed at this level as soon as economically feasible.






BIBLIOGRAPHY 



Anastasi, Anne. Psychological Testing. New York: The
Macmillan Company, 1954.

Barzun, Jacques. The House of Intellect. New York: Harper
& Brothers, Publishers.

Clark, W. W., et al. Survey of Mental Maturity, Form I.
Los Angeles: California Test Bureau, 1959.

Cronbach, Lee J. and Goldine C. Gleser. Psychological Tests
and Personnel Decisions. Urbana: University of Illinois
Press, 1957.

Freeman, Frank S. Theory and Practice of Psychological
Testing. New York: Holt, Rinehart and Winston, 1962.

Hartley, Eugene L. and Ruth E. Hartley. Outside Readings in
Psychology. New York: Thomas Y. Crowell Company, 1957.

Hoffmann, Banesh. The Tyranny of Testing. New York: The
Crowell-Collier Press, 1963.

Johnson, Rossall J. Personnel and Industrial Relations.
Homewood: Richard D. Irwin, Inc., 1960.

Onions, C. T. (ed.) The Oxford Universal Dictionary. New
York: Rand McNally & Company, 1955.

Stuit, Dewey B. (ed.) Personnel Research and Test Development
in the Bureau of Naval Personnel. Princeton: Princeton
University Press, 1947.

Thorndike, Robert L. and Elizabeth Hagen. Ten Thousand
Careers. New York: John Wiley & Sons, Inc., 1959.

Thorndike, Robert L. and Elizabeth Hagen. Measurement and
Evaluation in Psychology and Education. New York:
John Wiley & Sons, Inc., 1955.

Wechsler, David. The Range of Human Capacities. Baltimore:
The Williams & Wilkins Company, 1952.

Hoffmann, Banesh. "The Tyranny of Multiple-Choice Tests",
Harper's Magazine, CCXXII, No. 1330 (March, 1961).






Martz, D. J. and T. S. Ruskin. "Determination of Valid
Criteria for Selecting Postgraduate Management School
Candidates on the Basis of Established Academic
Performance and Various Aptitude Tests". U. S. Naval
Postgraduate School, Monterey, California, 1962.
(Unpublished)

Bureau of Naval Personnel. Information and Education Manual.
NavPers 16,9630, August 1955.

U. S. Naval Personnel Research Field Activity, Bureau of
Naval Personnel. Research Report 56-2, Development and
Standardization of the U. S. Navy Basic Test Battery.
San Diego, November 1956.

Personnel Measurement Research Branch, Personnel Analysis
Division. Research Report 57-1. Bureau of Naval
Personnel, NavPers 18344A, Revised Edition, April 1957.






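APPENDIX A



NORMAL CURVE

[Figure: normal curve. Area under curve, by band (legible values):
13.59%, 2.14%, .13%. Horizontal scales: Standard Deviations;
*Navy Standard Scores (T-Scores), marked 20, 30, 40, 50, 60, 70, 80;
**Cumulative Percentage, marked 2%, 16%, 50% (legible values).]
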

*The range of Navy Standard Scores theoretically extends
from zero to one hundred, but the probability of a score
being below 20 or above 80 is .13%, and scores in the extreme
ranges are not included.

**Percentage figures indicate what part of the population falls
below a given point.
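
The relationship between the three scales can be sketched
numerically. The example below assumes the conventional T-score
scaling of a mean of 50 and a standard deviation of 10, which is
consistent with the scores of 20 and 80 lying three standard
deviations from the mean. The function names are illustrative
only.

```python
from statistics import NormalDist

# Assumed scaling: mean 50, standard deviation 10 (conventional T-score).
NAVY_STANDARD = NormalDist(mu=50.0, sigma=10.0)


def standard_score(deviations_from_mean):
    """Convert a position in standard deviations to a standard score."""
    return 50.0 + 10.0 * deviations_from_mean


def cumulative_percentage(score):
    """Percentage of the population falling below the given score."""
    return 100.0 * NAVY_STANDARD.cdf(score)


# Tabulate the scale points shown on the curve.
for deviations in range(-3, 4):
    score = standard_score(deviations)
    print(deviations, score, round(cumulative_percentage(score), 2))
```

Under this scaling the percentage below a score of 20 equals the
percentage above a score of 80, so the small figure quoted in the
first note applies to each tail separately.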






APPENDIX B

[Figure: three scatter diagrams of job success (vertical axis,
0 to 1.0) against test score (horizontal axis, 0 to 1.0), titled
Perfect positive correlation, Perfect negative correlation, and
Zero correlation.]






APPENDIX C 



[Figure: plotted results for the 1962 Navy Management School
Class. Axis label: OCB Math and Verbal Test Scores.]







